Parquet to bypass Arrow #3
…ide to bypass the Arrow serialization for DataFrame
LGTM 👍 This is awesome! It will unlock a lot of interesting features we released in the last couple of months 🙌
However, there might be incompatibilities with some of the less commonly used column types. You could test this by running some of the scripts we use for e2e testing, e.g. st_data_editor_column_types.py or most of the st_arrow_... scripts.
  _LOGGER.info(
-     "Serialization of dataframe to Arrow table was unsuccessful due to: %s. "
+     "Serialization of dataframe to Parquet table was unsuccessful due to: %s. "
      "Applying automatic fixes for column types to make the dataframe Arrow-compatible.",
      ex,
  )
  df = fix_arrow_incompatible_column_types(df)
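For readers following the hunk above, here is a minimal sketch of the retry-after-fix pattern it belongs to, written against plain Python values rather than a real DataFrame. The names `serialize_with_fallback`, `strict_serialize`, and `stringify` are hypothetical stand-ins, not functions from this PR, and catching `TypeError` alongside `ValueError` is an assumption based on a later commit title in this thread.

```python
import logging
from typing import Callable, TypeVar

_LOGGER = logging.getLogger(__name__)
T = TypeVar("T")


def serialize_with_fallback(
    data: T,
    serialize: Callable[[T], bytes],
    fix: Callable[[T], T],
) -> bytes:
    """Try to serialize; on a type-related failure, fix the data and retry once."""
    try:
        return serialize(data)
    except (ValueError, TypeError) as ex:
        # Assumption: the serializer signals unsupported column types
        # with ValueError or TypeError.
        _LOGGER.info(
            "Serialization was unsuccessful due to: %s. "
            "Applying automatic fixes for column types.",
            ex,
        )
        # A second failure propagates to the caller unchanged.
        return serialize(fix(data))


def strict_serialize(rows: list) -> bytes:
    """Hypothetical stand-in serializer that only accepts strings."""
    if not all(isinstance(r, str) for r in rows):
        raise TypeError("non-string value")
    return ",".join(rows).encode()


def stringify(rows: list) -> list:
    """Hypothetical stand-in fix: convert every value to a string."""
    return [str(r) for r in rows]
```

Calling `serialize_with_fallback(["a", 1], strict_serialize, stringify)` fails on the first attempt, logs the reason, and succeeds after the fix. The retry happens only once so that a genuinely unserializable input still raises.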
Just a note: Parquet might have incompatibilities that the Arrow serialization does not. It might be good to add fixes for them here once we have identified column types that don't work.
I did some tests, and most of the common data/column types work fine. This includes: string, boolean, integer, float, datetime 👍 This will already cover most use cases. Unfortunately, some of the less common data types don't work: lists, date, time, interval, period. We might be able to get them working with a bit more debugging, or - as a fallback - add them to
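One way to picture a fallback for the types listed above is to convert the affected columns to strings before serialization. The sketch below is only an illustration under stated assumptions: it models a table as a dict of column lists instead of a pandas DataFrame, `fix_incompatible_columns` and `_needs_fallback` are hypothetical names, and it covers only the standard-library types (lists, `date`, `time`), since interval and period are pandas-specific.

```python
import datetime as dt
from typing import Any, Dict, List


def _needs_fallback(value: Any) -> bool:
    """Return True for value types reported above as not working."""
    # datetime is a subclass of date, so it has to be ruled out first:
    # datetime values were reported as working.
    if isinstance(value, dt.datetime):
        return False
    return isinstance(value, (list, dt.date, dt.time))


def fix_incompatible_columns(
    columns: Dict[str, List[Any]],
) -> Dict[str, List[Any]]:
    """Convert columns holding unsupported values to strings; copy the rest."""
    fixed: Dict[str, List[Any]] = {}
    for name, values in columns.items():
        if any(_needs_fallback(v) for v in values):
            # Keep missing values as None instead of the string "None".
            fixed[name] = [None if v is None else str(v) for v in values]
        else:
            fixed[name] = list(values)
    return fixed
```

Converting to strings loses the original type on the receiving side, so this is a last resort rather than a substitute for real support of those types.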
…patibility Improve compatibilities with some column types
…patibility Change logic that handle list values
…patibility Move the decoding above the string parsing
@lukasmasuch Thank you very much for your huge help!
* Introduce fastparquet on the Python side and parquet-wasm on the JS side to bypass the Arrow serialization for DataFrame
* Patch DataEditor not to use PyArrow
* Add setTimeout() so the import of parquet-wasm to work in @stlite/mountable
* Fix comments
* Fix incompatibilities with some column types
* Change logic to handle lists from fastparquet
* Move the decoding above the string parsing
---------
Co-authored-by: lukasmasuch <lukas.masuch@gmail.com>
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14)
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14), Modification for Streamlit 1.27 (#7)
…, Fix data_frame_to_bytes to catch TypeError in addition to ValueError (#14), Modification for Streamlit 1.27 (#7), Fix Quiver.ts (#23)
No description provided.